GP-Fileprints: File Types Detection Using Genetic Programming
Authors
Abstract
We propose a novel application of Genetic Programming (GP): identifying file types by analysing raw binary streams, i.e., without using metadata. GP evolves programs with multiple components. One component analyses statistical features extracted from the raw byte series to divide the data into blocks. A second component analyses these blocks to obtain a signature for each file in a training set. Two further evolved components then project the signatures onto a two-dimensional Euclidean space, where K-means clustering groups similar signatures. Each cluster is labelled with the dominant label among its members. Once a program that achieves good classification has been evolved, it can be applied to unseen data without any further evolution. Experimental results show that GP compares very well with established file classification algorithms (i.e., Neural Networks, Bayes Networks and J48 Decision Trees).
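The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: in the paper the block splitter, the block analyser and the two projection functions are evolved by GP, whereas here they are replaced by fixed, hand-written stand-ins (fixed-size blocks, per-block Shannon entropy, and a projection onto mean and range). All function names, the block size of 64 and the toy training data are assumptions made for illustration only, and inputs are assumed to be non-empty.

```python
import math
from collections import Counter

def byte_entropy(block):
    """Shannon entropy (bits per byte) of a non-empty block of bytes."""
    counts = Counter(block)
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def signature(data, block_size=64):
    """Hand-written stand-in for the evolved splitter and block analyser:
    fixed-size blocks, one entropy value per block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [byte_entropy(b) for b in blocks]

def project(sig):
    """Hand-written stand-in for the two evolved projection components:
    maps a signature to the 2-D point (mean, range)."""
    return (sum(sig) / len(sig), max(sig) - min(sig))

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, k, iters=20):
    """Plain K-means with a deterministic start (assumes k >= 2):
    initial centroids are spread evenly over the sorted points."""
    ordered = sorted(points)
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids

def train(files, k=2):
    """files: list of (bytes, label). Returns (centroids, cluster labels),
    each cluster taking the dominant label of its members."""
    points = [project(signature(data)) for data, _ in files]
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, (_, label) in zip(points, files):
        votes[nearest(p, centroids)][label] += 1
    labels = [v.most_common(1)[0][0] if v else None for v in votes]
    return centroids, labels

def classify(data, model):
    """Label unseen data by its nearest cluster; no retraining is needed."""
    centroids, labels = model
    return labels[nearest(project(signature(data)), centroids)]

# Toy training set (hypothetical data): repetitive text vs. high-entropy bytes.
training_set = [
    (b"hello world " * 60, "text"),
    (b"lorem ipsum " * 50, "text"),
    (b"aaaa bbbb " * 70, "text"),
    (bytes((i * 37 + 1) % 256 for i in range(640)), "bin"),
    (bytes((i * 101 + 7) % 256 for i in range(768)), "bin"),
    (bytes((i * 53 + 3) % 256 for i in range(512)), "bin"),
]
model = train(training_set)
```

Calling `classify(some_bytes, model)` then assigns an unseen byte string to the nearest cluster and returns that cluster's dominant label, mirroring the abstract's point that a trained program can be reused on new data as-is.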
Similar resources
Estimation of Discharge over the Submerged Compound Sharp-Crested Weir using Artificial Neural Networks and Genetic Programming
Truncated sharp crested weirs are used to measure flow rate and control upstream water surface in irrigation canals and laboratory flumes. The main advantages of such weirs are ease of construction and capability of measuring a wide range of flows with sufficient accuracy. Artificial neural networks (ANNs) and genetic programming (GP) have recently been used for estimation of hydraulic data. In...
RELATIONSHIP OF TENSILE STRENGTH OF STEEL FIBER REINFORCED CONCRETE BASED ON GENETIC PROGRAMMING
Estimating mechanical properties of concrete before designing reinforced concrete structures is among the design requirements. Steel fibers have a considerable effect on the mechanical properties of reinforced concrete, particularly its tensile strength. So far, numerous studies have been done to estimate the relationship between tensile strength of steel fiber reinforced concrete (SFRC) and ot...
Application of Genetic Programming to Modeling and Prediction of Activity Coefficient Ratio of Electrolytes in Aqueous Electrolyte Solution Containing Amino Acids
Genetic programming (GP) is one of the computer algorithms in the family of evolutionary-computational methods, which have been shown to provide reliable solutions to complex optimization problems. The genetic programming under discussion in this work relies on tree-like building blocks, and thus supports process modeling with varying structure. In this paper the systems containing amino ac...
Frequency domain analysis of transient flow in pipelines; application of the genetic programming to reduce the linearization errors
Transient flow analysis by the frequency domain method (FDM) is computationally much faster than the method of characteristics (MOC) in the time domain. FDM needs no discretization in time and space, but requires the linearization of governing equations and boundary conditions. Hence, the FDM is only valid for small perturbations in which the system’s hydraulics is almost linear. In this st...
Estimating scour below inverted siphon structures using stochastic and soft computing approaches
This paper uses nonlinear regression, Artificial Neural Network (ANN) and Genetic Programming (GP) approaches to predict an important practical quantity, i.e. the scour dimensions downstream of inverted siphon structures. Dimensional analysis and nonlinear regression-based equations were proposed for estimation of maximum scour depth, location of the scour hole, location and height of the dune downs...